Time-frequency analysis of vocal source signal for speaker recognition
Abstract
This paper investigates the importance of spectro-temporal characteristics of the source excitation signal for speaker recognition. We propose an effective feature extraction technique for obtaining essential time-frequency information from the linear prediction (LP) residual signal, which is closely related to the glottal excitation of the individual speaker. With pitch-synchronous analysis, a wavelet transform is applied to every two pitch cycles of the LP residual signal to generate a new feature vector, called Wavelet Octave Coefficients of Residues (WOCOR), which provides additional speaker-discriminative power to the commonly used linear predictive cepstral coefficients (LPCC). Experimental evaluation on a Cantonese speaker recognition corpus demonstrates the effectiveness of WOCOR for speaker recognition. Recognition using WOCOR together with LPCC outperforms the conventional method of using Mel-frequency cepstral coefficients (MFCC).
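The pipeline described in the abstract (LP inverse filtering, then a wavelet transform over windows of two pitch cycles) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the wavelet, the number of octave levels, or how coefficients are summarized, so the Haar wavelet, the per-octave L2 norm, the one-cycle hop between windows, and the assumption that pitch marks are already available are all illustrative choices. The function names are hypothetical.

```python
import math

def lp_coefficients(x, order):
    """Levinson-Durbin recursion on the autocorrelation of x.
    Returns a[1..order] so that x[n] is predicted by sum_k a[k] * x[n-k]."""
    n = len(x)
    r = [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate frame (e.g. all zeros): stop the recursion
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

def lp_residual(x, order=12):
    """Inverse-filter x with its own LP coefficients to get the residual."""
    a = lp_coefficients(x, order)
    res = []
    for n in range(len(x)):
        pred = sum(a[k] * x[n - 1 - k] for k in range(order) if n - 1 - k >= 0)
        res.append(x[n] - pred)
    return res

def haar_octave_norms(segment, levels=4):
    """L2 norm of the Haar detail coefficients at each octave level.
    Haar and the L2 norm are assumptions, not taken from the paper."""
    approx = list(segment)
    s = math.sqrt(2.0)
    norms = []
    for _ in range(levels):
        pairs = len(approx) // 2  # a trailing odd sample is dropped
        detail = [(approx[2 * i] - approx[2 * i + 1]) / s for i in range(pairs)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / s for i in range(pairs)]
        norms.append(math.sqrt(sum(d * d for d in detail)))
    return norms

def wocor_like_features(residual, pitch_marks, levels=4):
    """One feature vector per window spanning two consecutive pitch cycles.
    pitch_marks are sample indices, assumed to be given by a pitch tracker."""
    feats = []
    for i in range(len(pitch_marks) - 2):
        seg = residual[pitch_marks[i]:pitch_marks[i + 2]]
        feats.append(haar_octave_norms(seg, levels))
    return feats
```

Given a voiced frame `x` and its pitch marks, `wocor_like_features(lp_residual(x), pitch_marks)` returns one vector of `levels` values per two-cycle window. In a full system such vectors would be used alongside vocal tract features such as LPCC, as the abstract describes.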
Similar resources
Time-Frequency Representation of Vocal Source Signal for Speaker Verification
We propose an effective feature extraction technique for obtaining essential time-frequency information from the linear prediction (LP) residual signal, which is closely related to the glottal vibration of the individual speaker. With pitch-synchronous analysis, a wavelet transform is applied to every two pitch cycles of the LP residual signal to generate a new feature vector, called Wavelet Based F...
Integrating Complementary Features from Vocal Source and Vocal Tract for Speaker Identification
This paper describes a speaker identification system that uses complementary acoustic features derived from the vocal source excitation and the vocal tract system. Conventional speaker recognition systems typically adopt the cepstral coefficients, e.g., Mel-frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC), as the representative features. The cepstral fea...
Comparative Analysis of Discrimination Power of the Vocal Source and Vocal Tract Features for Speaker Verification
The paper comparatively analyzes the speaker discrimination power of the vocal source and vocal tract related features and presents a speaker verification system that optimally utilizes the source- and tract-related speaker-specific information. A pitch-synchronous wavelet transform is adopted to capture the speaker-specific information from the vocal source signal, particularly the Linear Prediction ...
Convolutional Neural Network with Adaptive Windows for Speech Recognition
Although speech recognition systems are widely used and their accuracy is continuously improving, there is a considerable gap between their accuracy and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
Down-sampling speech representation in ASR
Features for automatic speech recognition (ASR) are typically sampled at about 100 Hz (10 ms analysis step). Recent experiments indicate that the most efficient components of the modulation spectrum of speech for ASR are up to about 16 Hz [1]. Consequently, RASTA processing attenuates modulation frequencies higher than 16 Hz and should in principle allow for a subsequent down-sampling of the feat...
Publication date: 2004